Learning to Design Games: Strategic Environments in Deep Reinforcement Learning

Abstract

In typical reinforcement learning (RL), the environment is assumed given and the goal of the learning is to identify an optimal policy for the agent taking actions through its interactions with the environment. In this paper, we extend this setting by considering the environment is not given, but controllable and learnable through its interaction with the agent at the same time. Theoretically, we find a dual Markov decision process (MDP) w.r.t. the environment to that w.r.t. the agent, and solving the dual MDP-policy pair yields a policy gradient solution to optimizing the parametrized environment. Furthermore, environments with discontinuous parameters are addressed by a proposed general generative framework. While the idea is illustrated by an extended two-agent rock-paper-scissors game, our experiments on a Maze game design task show the effectiveness of the proposed algorithm in generating diverse and challenging Mazes against different agents with various settings.
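
The idea of optimizing a parametrized environment by policy gradient can be illustrated with a toy sketch. This is not the paper's algorithm: it assumes a designer that picks one of three hypothetical environment configurations from a softmax distribution, a fixed (non-learning) agent whose noisy return per configuration is hard-coded, and a designer reward equal to the negative of the agent's return. The names `agent_return` and `train_designer`, the return values, and the hyperparameters are all illustrative assumptions.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def agent_return(config, rng):
    """Hypothetical stand-in for rolling out a fixed agent in `config`.

    The mean returns are made-up numbers: configuration 2 is assumed to be
    the hardest for the agent (lowest return).
    """
    mean_returns = [1.0, 0.6, 0.2]
    return mean_returns[config] + rng.gauss(0.0, 0.1)


def train_designer(steps=2000, lr=0.1, seed=0):
    """REINFORCE over environment parameters (softmax logits).

    Assumption: the designer is adversarial, so its reward is the negative
    of the agent's return. Returns the final distribution over configs.
    """
    rng = random.Random(seed)
    logits = [0.0, 0.0, 0.0]
    baseline = 0.0  # running average of designer reward, reduces variance
    for _ in range(steps):
        probs = softmax(logits)
        config = rng.choices(range(3), weights=probs)[0]
        reward = -agent_return(config, rng)
        baseline += 0.05 * (reward - baseline)
        advantage = reward - baseline
        for k in range(3):
            # Gradient of log softmax probability of the sampled config
            # with respect to logit k: indicator(k == config) - probs[k].
            grad_log = (1.0 if k == config else 0.0) - probs[k]
            logits[k] += lr * advantage * grad_log
    return softmax(logits)


if __name__ == "__main__":
    print(train_designer())
```

Under these assumptions the designer should shift most of its probability mass toward the configuration in which the agent earns the least. In the setting the abstract describes, the agent would also be learning at the same time, which this sketch deliberately omits.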
